Abstract

A table-driven, syntax-directed scheme for automatic
error recovery during program compilation is described. It
is designed to be used in a table-driven LALR parser which
can form the nucleus of a language translator. After the
underlying theory is reviewed, the implementation of the
error recovery scheme is discussed. Examples of its
performance are given and comparisons are made with some
other methods. It is concluded that it would be worth
including in a compiler writing system for

Chapter 1. Introduction


All  compilers  must,  if  they  are  to  be  tools  of 
programmers,  rather  than  obstacles,  be  able  to  detect  and 
recover  from  syntactic  errors  in  the  programs  they  process. 

Detection  is  the  discovery  of  constructs  which  are  not 
allowed  according  to  the  syntactic  definition  of  the 
language.  The  detection  process  is  absolutely  essential, 
and  must  be  able  to 

1.  locate  the  error,  and 

2. inform the programmer precisely what the error is
(note that we say inform and not just "print a
message").


Implementation of error detection is usually fairly
easy, and is particularly straightforward if a syntax-directed
parser is used to analyze the program. Such
parsers, however, may not always detect the error as soon as
the illegal construct is scanned. Simple precedence parsers
have this problem [Lei 70]. Others, such as LR(k) parsers,
do detect errors as soon as a symbol read is illegal in the
context of the parse to that point.
Recovery is as desirable a process as detection. Given
that an error has been detected, a recovery procedure should
be able to

1. make appropriate replacements, insertions, or
deletions to allow compilation to continue,

2. inform the programmer exactly what these changes
are, and

3. tell the compiler how and where to resume
compilation.

It  is  easy  to  see  why  every  effort  must  be  made  to 
enable  the  resumption  of  compilation.  A  good  error  recovery 
system  can  consistently  reduce  the  number  of  runs  needed  to 
eliminate  all  syntactic  errors  in  programs. 

As subgoals of point 1 above, an error recovery routine
should try to

1. find all errors during the parse, that is, recovery
attempts on one error should not adversely affect
detection of, or recovery from, a following one.

2. prevent an "avalanche" of errors being generated
by a poor recovery attempt.

A  syntax-directed  error  recovery  scheme  is  one  which 
depends  upon  the  formal  definition  of  the  language  syntax 
rather than upon ad hoc heuristics. If such a scheme is
integrated with a syntax-directed parser, then recovery is
likely to be rather difficult to do well; we must do without
the semantic information an error recovery scheme written
for a particular language would have. However, in using
such a syntax-directed system, we gain the portability of
language independence: once designed, the system could be
installed in several compilers without alteration. For the
compiler writer, this is an extremely valuable feature, as
it allows him to concentrate his initial efforts on aspects
of program compilation which cannot be automated as easily
or as well.

This thesis will be concerned with a table-driven
syntax-directed error recovery algorithm which was embedded
in a table-driven syntax-directed parser, in particular, an
LALR(1) parser [Lai 71].
Chapter 2. Theory

In this chapter we review the basic notation and
terminology we shall require, then define LR(k) grammars,
parsers, and LALR(k) grammars, and finally introduce error
recovery.

2.1 Preliminary Notation and Terminology

A vocabulary V is the union of two disjoint sets of
symbols, the nonterminal symbols Vn and the terminal symbols
Vt. W*, where W is any set of symbols, is the set of all
finite-length strings over W including the null string. We
shall use uppercase letters (A, B, C, ...) for symbols in
Vn, lowercase letters from the beginning of the alphabet (a,
b, c, ...) for symbols in Vt, lowercase letters from the end
of the alphabet (u, v, w, ...) for strings in Vt*, and
lowercase Greek letters (α, β, γ, ...) for strings in V*.

Let |α| denote the length of α and k:α the first k symbols
of α if |α| > k and α otherwise. We shall assume the usual
definition of parse ([Kor 69], [Lei 70]).

Several definitions of LR(k) grammars have been given
([Knu 65], [HSU 69], [CSS 70]). We shall use that of
DeRemer [DeR 69] and follow his terminology fairly
closely.
A context-free (CF) grammar is a quadruple
G = (Vt, Vn, S, P) where S is a particular member of Vn called
the goal symbol and P is a finite set of productions, each
of which is written as A -> ω where A is the left part and
ω is the right part. Without loss of generality we can
assume that the productions are numbered from 1 and that the
first production is of the form S -> |- S' -|, where S' is
a subordinate goal symbol and |- and -| are some symbols in
Vt not otherwise occurring in P.

2.2 LR(k) Grammars

Let A -> ω be a production. We write α' -> α to mean
an immediate derivation of a string α = ρωβ from another
α' = ρAβ. A right derivation is written α' ->* α, which
means there exist strings α0, α1, ..., αn such that

α' = α0 -> α1 -> ... -> αn = α

for n>0, and where, for i=1,2,...,n, the rightmost
nonterminal in α(i-1) is used to derive αi. A canonical form
is any string right derivable from S.

Note  that  a  right  derivation  is  a  right-to-left  process 
since  we  always  replace  the  rightmost  nonterminal.  Since  a 
parse  is  the  reverse  of  a  derivation,  it  proceeds  from  left 
to  right. 
Let G be a CF grammar with s productions and let
{#1, #2, ..., #s} be a set of symbols not in V such that #1 is
associated with production 1, #2 with 2, ..., and #s with s.
Let the p'th production be A -> ω, and let α' = ρAβ and
α = ρωβ be canonical forms such that there exists a right
derivation S ->* α' -> α. Then ρω#pβ is a characteristic
string of α.

Let k be a non-negative integer. A CF grammar G is
LR(k) iff every canonical form α = ρωβ of G, except α = S, has
a unique characteristic string ρω#pβ which can be determined
by investigating only ρω and k:β. This definition, due to
DeRemer, is a reformulation of the original definition by
Knuth [Knu 65].

2.3 Parsers

The parser generator system we shall use [Lai 71]
produces an LALR(k) parser by first generating an LR(0)
parser and then resolving all inadequate states by a
lookahead of no more than k symbols. Accordingly, we shall
limit the following discussion to LR(0) grammars and then
extend it to LALR(k).

We shall use the following grammar for illustrative
purposes:

G1 = ({i, +, (, ), |-, -|}, {S, E, T}, S, P1)

where P1 is the following set of productions:

1  S -> |- E -|
2  E -> T + E
3  E -> T
4  T -> i
5  T -> ( E )


... sets. Each member of ... production which is a ...
"." in its right part. ... {S -> .|- S' -|}. ... called a
"state" of the ... certain state. The ... configuration set
indicate ... point, that is, they ... may have reached
... that contains only the ... marker to the left of |- ...
read must be |-. ... set has at least one ... configuration
set. The successor of S0 is called a |--successor and contains
the configuration S -> |-.S' -|, among others. In general, a
configuration set Si will have a s-successor for each symbol
s for which there is a marked production in Si with the
marker just to the left of s; each such s-successor will
contain all such marked productions with the mark moved one
symbol to the right.

The remaining elements of the s-successor are found
recursively by determining the set of configurations of the
form A -> .ω such that A -> ω is in P and there exists a
configuration with a marker before an A in the s-successor.
The s-successor now consists of all rules which the parser
might begin to process, in addition to those it is already
processing. We must add these new configurations to form
the new state because if a configuration has a marker before
a nonterminal, we must also have all the productions
defining that nonterminal in the set, because we are parsing
strings containing terminal symbols.


In the sp... case wit... and adding to the set all
successors of members already in the set.


For example, we could begin construction of an LR(0)
parser for grammar G1 by first writing down the initial
configuration set S0 = {S -> .|- E -|}. This set has one
successor, namely a |--successor, one member of which is the
marked production S -> |-.E -|. Completing the successor,
we must add the marked productions

E -> .T + E
E -> .T
T -> .i
T -> .( E )

So S1 = {S -> |-.E -|, E -> .T + E, E -> .T, T -> .i,
T -> .( E )}.
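
The construction of S0 and S1 just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the representation of a configuration as a pair (production index, marker position) and the names PRODUCTIONS, closure, successor and show are hypothetical and are not taken from the parser generator of [Lai 71].

```python
# Grammar G1. Productions are indexed from 0 here, so index p
# corresponds to production p+1 in the text.
PRODUCTIONS = [
    ("S", ("|-", "E", "-|")),
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {"S", "E", "T"}


def closure(items):
    """Add A -> .w for every nonterminal A found just right of a marker."""
    result = list(items)
    # Appending to a list while iterating over it is well defined in
    # Python; newly added configurations are visited in turn.
    for prod, dot in result:
        rhs = PRODUCTIONS[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for k, (lhs, _) in enumerate(PRODUCTIONS):
                if lhs == rhs[dot] and (k, 0) not in result:
                    result.append((k, 0))
    return result


def successor(items, symbol):
    """Move the marker over `symbol`, then complete the set."""
    kernel = [
        (p, d + 1)
        for p, d in items
        if d < len(PRODUCTIONS[p][1]) and PRODUCTIONS[p][1][d] == symbol
    ]
    return closure(kernel)


def show(item):
    """Render a configuration, with '.' standing for the marker."""
    prod, dot = item
    lhs, rhs = PRODUCTIONS[prod]
    return lhs + " -> " + " ".join(rhs[:dot] + (".",) + rhs[dot:])


S0 = closure([(0, 0)])
S1 = successor(S0, "|-")
for item in S1:
    print(show(item))
```

Running the sketch prints the five marked productions of S1 in the order in which they are listed above.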


We note at this time that the parser generator which we
use and which is described in section 4.1.1 does not require
the left endmarker |- and therefore the initial
configuration set becomes S1.


2.4 LALR(k) Grammars and a Parsing Algorithm

Corresponding to the LR(0) parser is a finite state
machine (FSM). Each state in the FSM corresponds to a
configuration set; the transitions of the FSM correspond to
the successor relations; the final state corresponds to the
empty configuration set. We call the FSM the characteristic
FSM (CFSM) of G.

state    configuration set     successor relation
number                         symbol    state

  0      {S -> .|- E -|}         |-        1

  1      {S -> |-.E -|           E         2
          E -> .T + E            T         4
          E -> .T                T         4
          T -> .i                i         5
          T -> .( E )}           (         6

  2      {S -> |- E.-|}          -|        3

  3      {S -> |- E -|.}         #1       11

  4      {E -> T.+ E             +         7
          E -> T.}               #3       11

  5      {T -> i.}               #4       11

  6      {T -> (.E )             E         8
          E -> .T + E            T         4
          E -> .T                T         4
          T -> .i                i         5
          T -> .( E )}           (         6

  7      {E -> T +.E             E         9
          E -> .T + E            T         4
          E -> .T                T         4
          T -> .i                i         5
          T -> .( E )}           (         6

  8      {T -> ( E.)}            )        10

  9      {E -> T + E.}           #2       11

 10      {T -> ( E ).}           #5       11

 11      ∅

Fig. 2.1

An LR(0) Parser for Grammar G1

A CFSM state with transitions under symbols in V only
is called a read state. A state with one transition under
a #n symbol and no transitions under terminal symbols is
called a reduce state. Any other state (excluding the final
state) is called an inadequate state. In this case there is
not enough information available to the FSM for it to decide
which transition to make.
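
As a small illustration of these three definitions, the sketch below classifies the states of the CFSM for G1. The dictionary CFSM transcribes the successor relations of Fig. 2.1; the dictionary layout and the function name classify are assumptions made for this example only, and the test for a reduce state is simplified to "exactly one #n transition and nothing else".

```python
# Successor relations of the LR(0) CFSM for G1 (state -> {symbol: state}).
# Symbols beginning with '#' are the production markers #1 .. #5.
# The final state 11 (the empty configuration set) is not listed.
CFSM = {
    0: {"|-": 1},
    1: {"E": 2, "T": 4, "i": 5, "(": 6},
    2: {"-|": 3},
    3: {"#1": 11},
    4: {"+": 7, "#3": 11},
    5: {"#4": 11},
    6: {"E": 8, "T": 4, "i": 5, "(": 6},
    7: {"E": 9, "T": 4, "i": 5, "(": 6},
    8: {")": 10},
    9: {"#2": 11},
    10: {"#5": 11},
}


def classify(transitions):
    """Return 'read', 'reduce' or 'inadequate' for one state."""
    reductions = [s for s in transitions if s.startswith("#")]
    others = [s for s in transitions if not s.startswith("#")]
    if not reductions:
        return "read"          # transitions under symbols in V only
    if len(reductions) == 1 and not others:
        return "reduce"        # a single #n transition and nothing else
    return "inadequate"        # the FSM cannot decide which move to make


kinds = {state: classify(moves) for state, moves in CFSM.items()}
inadequate = [state for state, kind in kinds.items() if kind == "inadequate"]
print(inadequate)
```

For G1 the only state reported as inadequate is state 4, which has both a transition under "+" and a transition under #3.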


A context free grammar is LR(0) if and only if its CFSM
has no inadequate states (Knuth). A grammar is LALR(k)
(lookahead LR(k)) if all its inadequate states can be
resolved by a local lookahead of not more than k symbols.
Thus our grammar G1 is not LR(0). However, it is LALR(1)
since the inadequate state (state 4) can be resolved by a
local lookahead of 1 symbol. We thus introduce a lookahead
state which decides which transition to perform in the place
of an inadequate state. In our grammar G1, then, we replace
state 4 with a lookahead state and add a read state and a
reduce state. State 4 then becomes

 4    {E -> T.+ E       +          12
       E -> T.}         {-|, )}    13

and the new states are

12    {E -> T.+ E}      +           7
13    {E -> T.}         #3         11

The following algorithm will parse a string in the
language generated by an LALR(1) grammar G. We have a stack
on which we store states entered by the CFSM; St will always
refer to the state on top of the stack. u is a string in
the language. Let w=u.

0. Place state S0 on the stack.

1. If St is a reduce state, go to step 3. If St is
a lookahead state, go to step 4.

2. (read state) Read a symbol from w, set w to the
suffix of w not yet read, place the corresponding
state on the stack and go to step 1.

3. (reduce state) Let the production associated with
the reduce state be A -> ω. Pop the top |ω| items
off the stack. If A=S, then the parse is
complete, so stop; otherwise place the state which
is the A-successor of state St on the stack, and
go to step 1.

4. (lookahead state) Read a symbol from w (but do not
reset w). Pop one entry off the stack, place the
state corresponding to the symbol read on the
stack and go to step 1.
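
The four steps above can be sketched as a short Python routine for grammar G1. The tables READ, REDUCE and LOOKAHEAD transcribe Fig. 2.1 with state 4 replaced by a lookahead state and states 12 and 13 added; their layout, the function name parse, and the use of SyntaxError to signal an illegal read symbol are assumptions of this sketch rather than the form of the tables emitted by the parser generator.

```python
# Read states: state -> {symbol: successor state}.
READ = {
    0: {"|-": 1},
    1: {"E": 2, "T": 4, "i": 5, "(": 6},
    2: {"-|": 3},
    6: {"E": 8, "T": 4, "i": 5, "(": 6},
    7: {"E": 9, "T": 4, "i": 5, "(": 6},
    8: {")": 10},
    12: {"+": 7},
}
# Reduce states: state -> (production number, left part, |right part|).
REDUCE = {
    3: (1, "S", 3),
    13: (3, "E", 1),
    5: (4, "T", 1),
    9: (2, "E", 3),
    10: (5, "T", 3),
}
# Lookahead states: state -> {next symbol: replacement state}.
LOOKAHEAD = {4: {"+": 12, "-|": 13, ")": 13}}


def parse(tokens):
    """Return the production numbers applied, in order of reduction."""
    stack = [0]                                   # step 0
    pos = 0                                       # w is tokens[pos:]
    applied = []
    while True:
        top = stack[-1]
        symbol = tokens[pos] if pos < len(tokens) else None
        if top in REDUCE:                         # step 3
            number, lhs, length = REDUCE[top]
            applied.append(number)
            del stack[-length:]
            if lhs == "S":
                return applied
            stack.append(READ[stack[-1]][lhs])    # A-successor of St
        elif top in LOOKAHEAD:                    # step 4: peek only
            target = LOOKAHEAD[top].get(symbol)
            if target is None:
                raise SyntaxError(f"illegal lookahead symbol {symbol!r}")
            stack[-1] = target                    # pop one entry, push one
        else:                                     # step 2
            target = READ[top].get(symbol)
            if target is None:
                raise SyntaxError(f"illegal read symbol {symbol!r} in state {top}")
            stack.append(target)
            pos += 1


print(parse("|- i + ( i ) -|".split()))
```

For the input |- i + ( i ) -| the sketch applies productions 4, 4, 3, 5, 3, 2, 1, which is the reverse of a right derivation of that string.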
2.5 Error Recovery

It has been shown [DeR 69] that when the CFSM is
presented with a string u' not in the language, it will
always detect the error as a symbol b in u' such that there
is no transition under b from the state which it is in. We
now come to the problem of what to do in order to continue
the parse if we are faced with such a string.

We shall use the approach suggested by Leinius
[Lei 70]. He discussed automatic error recovery for simple
precedence languages [WSW 66] and then proposed a similar
technique for LR(1) languages. He bases his error recovery
method on the phrase structure of a string and thus
generates a phrase-level error recovery algorithm. First,
the segment of the program which contains the error and
which is a potential phrase is isolated. Then a list is
compiled of all possible nonterminals to which the segment
could be reduced and which, if they were to replace the
segment, would retain syntactic consistency. Finally, one
of these nonterminals is selected to replace the segment.
If no such nonterminal exists, or if more than one such
nonterminal could be selected, a new potential phrase is
determined, and the attempt is repeated. This procedure
will always terminate because the largest possible potential
phrase can always be reduced to the goal symbol.
Recall that in the LALR(1) system, a state with a
transition under a symbol in Vn (in the parser, a
configuration with the symbol in Vn to the right of the
marker) did not mean that the nonterminal must appear in u,
but that the parser could, at this point, begin to parse
that nonterminal. In terms of an LALR parser, then, the
left end of a potential phrase becomes just such a state
which is also in the stack, and the nonterminal to which the
segment can be reduced is precisely the nonterminal to the
right of the marker. On encountering an illegal read
symbol, then, we can pop the state stack until the state on
top of the stack has at least one such transition, under a
nonterminal which can precede some terminal in u'. We then
discard symbols from u' until we encounter the terminal
which can legally follow the nonterminal. Finally the state
which corresponds to the nonterminal is placed on top of the
stack. The parse can then resume for at least one step
(before encountering possible further errors).

As an example, consider once again our grammar G1. Let
us represent the present state of the parse by placing the
stack elements to the left of a vertical bar, and the
remainder of the input string to the right of the bar. Thus

S0 S1 S2 | v

indicates that we are presently in state S2 and have not yet
read v. Assume we are to parse the string |- ( i ) ) + i
-|. Note that the string is not in the language generated
by G1. The parse will proceed as follows:

At this stage, in read state 2, we discover that ")" is
not in the set of legal read symbols, namely {-|}. State S1
is the first state on the stack with transitions under
nonterminals, namely T and E. If we pop one state from the
stack (i.e., replace the E) and discard the ")", we are
then left with a legal read symbol ("+" can be read from
state S4) and we can continue the parse.
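
The recovery step in this example can be sketched as follows. The sketch uses the LR(0) successor relations of Fig. 2.1 (reduce states have no read transitions and are omitted), and it resolves the choice among candidate nonterminals by preferring the one that discards the fewest input symbols; that tie-breaking rule, the function name recover and the table layout are assumptions of this illustration, not the selection rule of the thesis.

```python
NONTERMINALS = {"S", "E", "T"}

# Read transitions of the CFSM for G1 (state -> {symbol: successor}).
CFSM = {
    0: {"|-": 1},
    1: {"E": 2, "T": 4, "i": 5, "(": 6},
    2: {"-|": 3},
    4: {"+": 7},
    6: {"E": 8, "T": 4, "i": 5, "(": 6},
    7: {"E": 9, "T": 4, "i": 5, "(": 6},
    8: {")": 10},
}


def recover(stack, remaining):
    """Pop states, pick a nonterminal, and discard input until it fits.

    Returns (new stack, remaining input, nonterminal assumed).
    """
    stack = list(stack)
    while stack:
        best = None  # (symbols discarded, nonterminal, its successor state)
        for symbol, succ in CFSM.get(stack[-1], {}).items():
            if symbol not in NONTERMINALS:
                continue
            # Terminals that can be read once the nonterminal is assumed.
            readable = {t for t in CFSM.get(succ, {}) if t not in NONTERMINALS}
            for skip, token in enumerate(remaining):
                if token in readable:
                    if best is None or skip < best[0]:
                        best = (skip, symbol, succ)
                    break
        if best is not None:
            skip, symbol, succ = best
            return stack + [succ], remaining[skip:], symbol
        stack.pop()  # no usable nonterminal transition here; pop further
    raise SyntaxError("no recovery possible")


# The situation of the example: stack S0 S1 S2, remaining input ) + i -|
print(recover([0, 1, 2], [")", "+", "i", "-|"]))
```

In the situation of the example the sketch pops state 2, assumes the nonterminal T from state 1, discards the ")" and leaves state 4 on top of the stack with "+" as the next symbol, in agreement with the text.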


This procedure will always be able to resume the parse
because, if necessary, the stack could be popped to the
state with transition under the subordinate goal symbol with
"-|" as the next symbol.

It should be observed that while this algorithm is
actually a straight replacement algorithm, it will often
achieve the effect of an algorithm based on insertion of
symbols. This is so because the nonterminal which we use
for recovery may contain some of the symbols which we are
deleting from the stack. For example, a PL/I-like language
may contain the production

<statement> -> <assignment statement> ;

If the state corresponding to <assignment statement> is on
the stack and we pop it, replacing it with the state
corresponding to <statement>, we have in effect inserted a
semi-colon.

So far, the recovery algorithm as described is
completely automatic. In practice, however, there may not
be a unique action which leads to recovery. A state may
have transitions under more than one nonterminal which can
"read" a terminal. In addition, there may well be a state
not far down the stack which can read a terminal which may
be much farther along in the string u and at the same time
a state farther down the stack which can read one of the
first few terminals in u. Recovery using the former state
will involve discarding a large portion of the input string,
thereby impairing the intended program. Recovery using the
latter state may force the parser to give up much of the
information it may have built up about the structure of the
string. This problem will be discussed more fully in a
later section.
Chapter 3. Other Work in the Field

Several syntax-directed translating techniques exist
but most have used error recovery based on a particular
language and hence are of no interest to us ([McK 70]; the
systems of Dean and Schneider are discussed in [Lei 70]).
Cornell's PL/C compiler [Con 70] lies in this category but
includes spelling correction for keywords, a feature of the
error recovery system described in this thesis. Leinius has
generated a fairly comprehensive language-independent error
recovery algorithm for simple precedence grammars which we
outlined in section 2.5.

One of the most complete syntax-directed error recovery
algorithms published to date is that of Irons [Iro 63]. His
recovery algorithm works with a parsing algorithm which
parses strings describable in a BNF-like language without
BNF's recursive power but with an added iterative
capability. When faced with some terminal string, the
algorithm carries along all possible parses.

A "chain table" is built up for each terminal symbol
consisting of nonterminals which may be begun by the
terminal. Also constructed is a "syntax tree" containing
"syntax pointers" and elements in the vocabulary; this tree
indicates alternative parses and possible successor symbols.
An error in the terminal string is detected when no
parses can be continued; this occurs at or shortly after the
error. The error recovery will then proceed as follows:

1. Taking into account the parses just before the
error point and the syntax tree, a list is
compiled of all terminals or nonterminals which
can be called for after the error point.

2. The symbols at and after the error point are
examined one by one and discarded until one is
found which either

a) occurs on the list of 1, or

b) has an element on its chain which occurs
on the list of 1.

3. The production from 1 which is selected in 2 is
examined in relation to the parses to determine a
string of terminals which, when inserted at the
error point, will allow the parse to continue at
least one s... past the inserted string. ... is given as
to how this ... The string ... error ... continued.


Irons claimed his algorithm was efficient with respect
to time as well as memory space, although no figures were
given for any particular machine. As Leinius [Lei 70]
points out, the order in which the list in 1 is compiled may
possibly affect the recovery and there may well be more than
one string determined by 2 which could allow parsing to
continue. Irons makes no mention of how to resolve these
potential difficulties.

More recently, an interesting and as yet untested error
recovery scheme has been proposed by Gries for bottom-up
parsers [Gri 71]. Its main interest to us lies in a
comparison with the one described in the rest of this
thesis. Gries feels that a recovery algorithm should not
alter symbols already read by the parser, as this would
involve undoing any semantics associated with them, a fairly
difficult task. He claims that the effect of deleting or
inserting symbols at the top of the stack can be achieved by
merely inserting a terminal string immediately before the
error point (he assumes the parsing method detects an error
immediately upon reading the incorrect symbol). We now
describe his criteria for altering the input stream.

A set of error recovery routines is built up by a
constructor which accepts the grammar as input. Let α be
the part of the program already processed (the symbols in
the stack) and let α = γβ be such that β forms the first part
of the right part of some production (there may well be more
than one such β). Suppose t, the next terminal to be
scanned, is illegal.

1.  If  there  is  a  production  A  ->  in  the  grammar, 

a  terminal  string  u  should  be  inserted  such  that 
lu  ->*  u. 

2.  If  there  is  a  production  A  ->  ft such  that 
B  ->*  wtcf,  a  terminal  string  u  should  be  inserted 
such  that  U)  ->*  u. 

3.  If  there  exists  a  production  A  ->  (flBt^  such  that 
B  ->*  & 2CUJ2  and  C  ->  ftu)l  is  a  production,  a 
terminal  string  u  should  be  inserted  such  that 
ou  =  um2  ->*  u. 

4.  If  none  of  the  above  apply,  t  should  be  deleted. 

Once  again  left  unanswered  are  the  problems  of  what  to 
do  if  more  than  one  production  applies  or  more  than  one 
terminal  string  is  applicable. 
Chapter 4. Implementation

4.1 Local Environment

The recovery algorithm consists of modifications to two
programs, one a program producing tables used by the other
to parse programs written in the corresponding language.
If there are inadequate states, then an attempt is made
to resolve them by inserting lookahead states. The degree
k of lookahead may be specified as a parameter to the parser
generator and may be greater than one, although we shall
consider only the case k=1. Any inadequate states that
cannot be resolved cause the program to terminate without
continuing further
After any inadequate states are resolved, reduce states
are optimised by determining the state in which to resume
the parse after making the production from the stack itself
rather than from "reading" the produced nonterminal. This
would normally eliminate the need to retain nonterminals.
Our recovery algorithm, however, demands that we know which
states have a transition under a nonterminal, so it is at
this point that the parser generator has been amended to
retain this information.

Another significant optimisation ... parser generator is
one made to lookahead ... lookahead state, the "most popular"
destination ... chosen and becomes the destination state ...

b2                  s2
any other symbol    s1

Finally, the parser generator emits a set of tables in
the XPL language [McK 70]. An extra set of tables has been
added which contains the nonterminal information. This
information is in the form of two XPL vectors, RECOVER1 and
RECOVER2. There is one entry in RECOVER1 for each read
state of the CFSM. RECOVER1(J) (1 <= J <= no. of read states)
is the index into RECOVER2 of further information on state J.
For J = 1, 2, ..., r, where r is the number of read states,
RECOVER1(J+1) - 1 is the position of the last entry in
RECOVER2 for state J, since information for successive
states is stored contiguously in RECOVER2. If this position
is less than RECOVER1(J), then there are no
transitions under nonterminals from state J and as a result,
no error recovery can be done from this state. If this
position is greater than or equal to RECOVER1(J), there is at
least one transition under a nonterminal from state J and
error recovery can be done from this state. The portion of
RECOVER2 pertaining to state J is constructed as follows:
the parser generator. The recogniser can be regarded as a
program which maintains the state stack and the input stream
and performs the appropriate transitions from state to
state.

The error recovery exists as a set of XPL procedures
called whenever an illegal symbol is encountered in a read
state.

4.2 Algorithms

Two attempts are made to recover from syntax errors
encountered during the parse of a program.

4.2.1 Spelling Errors

First a check is performed to see if the error
condition is probably due to a spelling or keypunch error.
This is done only if the grammar contains identifiers and
keywords or reserved words. A keyword is assumed to be some
string of alphanumeric characters which appears as a
terminal in the vocabulary; for example, in PL/I: DO,
PROCEDURE, FIXED. A keyword is a reserved word if it has
the above property and may only appear in a program in that
context. XPL keywords are reserved words; PL/I keywords are
not. An identifier is some string of nonblank alphameric
characters which does not appear in the grammar.
Recall that the only error condition which can occur in
the optimised version of an LALR parser is an illegal read
symbol in a read state. During parsing, we can suspect a
spelling error only if a particular keyword is required at
a particular point of the parse, but we have instead an
identifier or another keyword. If, then, the illegal read
symbol is such an identifier or keyword, we check it for a
spelling error against all keywords under which there is a
transition from the current state.

Note that, using this criterion, not all keyword
spelling errors will be detected. Consider, for example,
the DO statement in any PL/I-like language. If the keyword
DO is misspelled, as in

SO WHILE A>0;

an error is detected when the WHILE is read (because SO is
incorrectly stacked as an identifier). We could, then,
check the top symbol in the stack for a spelling error.
This same spelling error could also be detected during the
semantic phase of compilation, assuming the language
requires all identifiers to be declared before they are
used.

Let A be the name of the illegal read symbol and B the
name of one such keyword. The following algorithm, due to
Morgan [Mor 70], will determine whether A is "close" enough
to B to probably be a spelling error. A is "close" to B if
A has (relative to B) one extra character, one missing
character, one different character, or two adjacent
characters interchanged.

1.  If |length(A) - length(B)| > 1, then return failure
    (they cannot be close enough if the lengths of the
    symbols differ by more than one).

2.  If length(A) < length(B), then interchange A and B.

3.  Exclusive-or A and B together, and place the
    result into TEMP. (TEMP is now a string the length
    of B, with zeros in those positions, and only in
    those positions, where A and B were identical.)

4.  If length(A) is not equal to length(B), go to step 8.

5.  (lengths equal) Find the first nonzero character
    in TEMP and call its position J. If either it is
    the last character in TEMP, or all following
    characters in TEMP are zero, then return success
    (single character error).

6.  (check permutation) If the J+1'st character of
    TEMP is nonzero and the rest are zero, then go to
    step 7. Otherwise, return failure.

7.  Check the J'th and J+1'st characters in A and B.
    If permutation, then return success, otherwise
    return failure.
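
The closeness test can be sketched in Python as follows. This is a minimal sketch, not Morgan's algorithm as printed: it compares characters directly instead of forming the exclusive-or string TEMP, and, because the step handling unequal lengths is not reproduced above, it assumes that case means "deleting one character from the longer string yields the shorter". The name is_close is hypothetical.

```python
def is_close(a: str, b: str) -> bool:
    """True if a and b differ by one extra, missing, or different
    character, or by two adjacent characters interchanged."""
    if abs(len(a) - len(b)) > 1:
        return False                      # step 1: lengths too far apart
    if len(a) < len(b):
        a, b = b, a                       # step 2: a is now the longer string
    if len(a) == len(b):
        # positions where the strings disagree (the nonzero places of TEMP)
        diff = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diff) == 1:
            return True                   # single different character
        if len(diff) == 2 and diff[1] == diff[0] + 1:
            j = diff[0]
            return a[j] == b[j + 1] and a[j + 1] == b[j]   # transposition
        return False                      # identical, or too many differences
    # Assumed handling of unequal lengths: a has exactly one extra character.
    return any(a[:i] + a[i + 1:] == b for i in range(len(a)))


if __name__ == "__main__":
    for candidate in ("SO", "WHLIE", "WHIL", "PROC"):
        for keyword in ("DO", "WHILE", "PROCEDURE"):
            if is_close(candidate, keyword):
                print(candidate, "is close to", keyword)
```

Identical strings are reported as not close, since an illegal read symbol can never equal the keyword it is being checked against.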
4.2.2 Syntax Errors

If the illegal read symbol cannot be corrected by
assuming a spelling error, we must apply our knowledge of
the LALR parsing technique.

[...] legitimately be proposed. The pattern can be easily
changed.

At the beginning of the recovery procedure, the next
four input symbols (after the illegal one) are immediately
read in without checking. This is done because the symbols
may need to be "read" more than once if the search order
causes ld to be decreased at some point. Note that this
device is not required using the above searching order; it
exists for the sake of generality.

After sd and ld have been determined, recovery is
attempted using the vectors RECOVER1 and RECOVER2. We
define the following procedures, arrays, and variables:

STATE_STACK(J)  is the J'th state in the stack
                representing the history of the parse to
                this point.

STACKPOINT      is the index into STATE_STACK indicating
                which state we are presently considering.

SYMBOLS(J)      is the J'th symbol in the input stream
                (the illegal one is in SYMBOLS(1)).

LOOKAHEAD_SET   is a procedure which is given a state
                and STACKPOINT and returns the number of
                legal lookahead symbols in SET# and
                places the actual symbols in an array
                SET.


Recovery is then attempted in the following manner:

     STATE# = STATE_STACK(STACKPOINT);
     LIMIT = RECOVER1(STATE#+1) - 1;
     DO PAIRS = RECOVER1(STATE#) TO LIMIT BY 2;
        SET# = LOOKAHEAD_SET(RECOVER2(PAIRS+1), STACKPOINT);
        DO J = 1 TO SET#;
           IF SYMBOLS(ld) = SET(J) THEN
              recovery is possible so perform
              recovery and return to the
              parsing procedure;
        END;
     END;
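
The search loop above might be rendered in Python roughly as follows. This is a sketch under stated assumptions: RECOVER2 is taken to hold (nonterminal, new state) pairs laid out flat, RECOVER1 to hold the starting index of each state's pairs, and the toy tables in the demonstration are invented for illustration. Python lists are 0-based, so the illegal symbol sits at symbols[0] and SYMBOLS(ld) becomes symbols[ld - 1].

```python
def attempt_recovery(state_stack, stackpoint, symbols, ld,
                     recover1, recover2, lookahead_set):
    """Return the first (nonterminal, new_state) pair whose new state can
    accept the symbol at lookahead depth ld, or None if no pair fits."""
    state = state_stack[stackpoint]
    # The pairs for this state occupy recover2[recover1[state] : recover1[state + 1]].
    for pairs in range(recover1[state], recover1[state + 1], 2):
        nonterminal, new_state = recover2[pairs], recover2[pairs + 1]
        legal = lookahead_set(new_state, stackpoint)   # plays the role of SET
        if symbols[ld - 1] in legal:
            return nonterminal, new_state
    return None


# Hypothetical toy tables: state 0 has two candidate pairs, state 1 has none.
RECOVER1 = [0, 4, 4]
RECOVER2 = ["<go to>", 7, "<statement>", 9]
LEGAL = {7: {"<identifier>"}, 9: {";", "END"}}

if __name__ == "__main__":
    print(attempt_recovery([0], 0, ["I", ";"], 2, RECOVER1, RECOVER2,
                           lambda state, sp: LEGAL[state]))
```

Because the pairs are tried in table order, the first pair that fits wins; the order of entries in RECOVER2 therefore decides which correction is proposed.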
[...] the search limits, then [...] an illegal symbol was
[...] parse (that is, the [...]ng to the states in the
[...] symbols from the input [...]d. The input stream is
[...]ch was not skipped will [...] popped to the point
[...]t symbol, the new state (in RECOVER2(PAIRS+1)) is
placed on the top, and the programmer is given the new
partial parse. Finally, control returns to the parser.


If recovery was not possible within the limits, the
programmer is informed that an illegal symbol was
encountered and is given the partial parse to that point.
To recover from the error, one of the search limits would
have to be exceeded. This means that several states on the
state stack would be eliminated, and a great amount of
semantics would therefore have to be "undone", or that
several symbols from the input stream would be entirely
ignored. Considering the amount of program which would be
done away with in the event of further attempts at recovery,
it is likely that there would be serious violations of the
two desiderata of error recovery mentioned in the
introduction; further errors could well go undetected, and
spurious errors would likely ensue. To spare the programmer
the confusion of pages of incomprehensible error messages,
parsing terminates, and the remainder of the program is
merely listed.

Chapter 5. Some Results


The recogniser, incorporating the error recovery
routines, was used to parse programs in the language L2
[Lei 70] generated by the grammar G2 (see fig. 5.1) and in
the language SPL (see appendix).


5.1 The Language L2

G2 is a grammar devised by Leinius in his thesis as a
source of examples during his explanation of error recovery
in LR(1) systems. We have adopted it here unchanged for the
sake of compatibility. As Leinius did not demonstrate the
results of any automatic recovery, we cannot compare any
results with his. However, as pointed out in Chapter 2, the
underlying philosophy is virtually the same, the only
difference being that Leinius had no heuristics for the
order of search.


Correct sentences in L2 are of the form xc(n)sfghb(m)y
or uc(n)sghkb(m)z, n,m>0, where a symbol followed by (n)
means the symbol is repeated n times. In the attached
listing (fig. 5.2), _|_ is the end of file indication.
First we shall discuss some of the sentences individually,
and then make some general remarks.


      1   S0 -> S
      2   S  -> xCD
      3   S  -> uG
      4   C  -> EsA
      5   E  -> Ec
      6   E  -> c
      7   A  -> fgh
      8   D  -> [illegible]
      9   F  -> bF
     10   F  -> by
     11   G  -> EsK
     12   K  -> BL
     13   B  -> ghk
     14   L  -> bL
     15   L  -> bz

     Fig. 5.1 The Grammar G2


1.  xccsfghbbb

The final y is missing. The recovery replaced the
final b with F, which is precisely equivalent to
adding a y.

2.  xcscfghbby

The s has been incorrectly interchanged with the
second c. Looking at the complete string, we can
see that the optimal recovery would be to eliminate
the second c. However, the recovery routine first
tries to recover looking only one symbol ahead,
and therefore decides to eliminate the s. As this
is not the correct recovery, an avalanche error
ensues where the parser wants the s it just got rid
of. To recover, it skips ahead to the first b and
replaces E with C, that is, it inserts sA.

4.  xccsfxhbbby 

An  x  was  typed  instead  of  g.  b  is  the  next  symbol 
in  the  input  string  which  is  also  the  first  symbol 
in  the  right  part  of  a  production.  Accordingly, 
xh  is  discarded  and  A  replaces  f,  i.e. ,  gh  is 
inserted  after  f. 

7.  xstghbbz

A c is missing between x and s. Since we are
working on the productions S -> xCD and S0 -> S,
the only possible recovery would be to skip to the
end of the program and produce an S. This would
not be very informative to the programmer and is
not done because of the lookahead limit. See
section 5.5.1 for an indication of how to reduce
the possibility of this happening.

10.  uccsghkhbbz 

There  is  an  extra  h  after  k.  The  h  was  correctly 
discarded. 

Of  a  sample  of  13  incorrect  sentences,  the  recovery 
performed well in six cases (1,3,4,6,10,13), performed
passably  in  two  cases  (5,12),  caused  an  avalanche  error  in 
one  case  (2) ,  and  gave  up  because  a  search  limit  was  reached 
in  four  cases  (7,8,9,11).  We  claim  that  recovery  was  only 
passable  in  cases  5  and  12  because  although  parsing  could 
resume  without  any  difficulty,  the  number  of  discarded 
symbols  and  the  changes  in  the  parse  stack  would  present 
difficulties  to  the  compiler  writer  in  the  semantics  which 
would  have  to  be  undone. 

Recovery  seems  to  be  most  effective  when  the  illegal 
symbol  occurs  at  or  near  the  end  of  the  right  part  of  a 
production. 
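
The two sentence forms of L2 quoted at the start of this section are regular, so membership can be checked with a pattern. The following Python fragment is only an illustrative sketch built from those two forms (with n, m > 0); it is not derived from the parsing tables, and the name is_l2_sentence is hypothetical.

```python
import re

# xc(n)sfghb(m)y  or  uc(n)sghkb(m)z, with n, m > 0
L2_SENTENCE = re.compile(r"xc+sfghb+y|uc+sghkb+z")


def is_l2_sentence(text: str) -> bool:
    """True if text has one of the two correct sentence forms of L2."""
    return L2_SENTENCE.fullmatch(text) is not None


if __name__ == "__main__":
    for sentence in ("xccsfghbbby", "xccsfghbbb", "xcscfghbby", "uccsghkhbbz"):
        print(sentence, is_l2_sentence(sentence))
```

Such a check only says whether a sentence is correct; it gives no indication of where the error lies, which is what the recovery routine must work out.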
5.2 SPL

Stanford's SPL [H&H 69] is a dialect of PL/I and is
designed for teaching beginning programming students. The
grammar for SPL is given in the appendix.

An analysis was performed of the types of errors made
by programmers of SPL. An MSP parser [McK 70] was used to
check the syntax of 1609 programs written for a first course
in programming in which SPL was the language used. 1408
errors were detected. Figure 5.3 shows the distribution of
these errors.


Within the category of illegal symbol pair, which
accounted for 65.35% of all syntax errors, a table of the
most common illegal pairs was compiled and is given in
figure 5.4.


On  inspecting  the  programs,  it  was  found  that,  as  would 
be  expected,  by  far  the  most  common  error  was  a  missing 
semi-colon;  this  error  produced  most  of  the  occurrences  of 
the illegal pairs <constant> <identifier>, <identifier>
<identifier>, <statement> ELSE, and ) <identifier>. The
pairs <identifier> <constant> and ) <identifier> were in
part due to missing operators. The pair
<assignment statement> TO stemmed from errors of the form

     DO; K=1 TO 12;.


Error Type                         Frequency        %

illegal symbol pair                    920        65.35
no production applicable               214        15.21
end of program at invalid point        169        12.00
illegal character                       77         5.47
something wrong with number              1          .07
others                                  27         1.92

total                                 1408       100.00

Fig. 5.3 Errors Detected in SPL Sample
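
The percentage column of figure 5.3 is each frequency divided by the total. A short Python check of that arithmetic is sketched below; the frequencies are those of the figure, while the variable names are ours. Shares rounded this way can differ from the printed figures in the last digit.

```python
# Frequencies from fig. 5.3.
errors = {
    "illegal symbol pair": 920,
    "no production applicable": 214,
    "end of program at invalid point": 169,
    "illegal character": 77,
    "something wrong with number": 1,
    "others": 27,
}

total = sum(errors.values())
# Share of each error type, as a percentage rounded to two places.
shares = {kind: round(100 * count / total, 2) for kind, count in errors.items()}

if __name__ == "__main__":
    for kind, count in errors.items():
        print(f"{kind:34s}{count:6d}{shares[kind]:8.2f}")
    print(f"{'total':34s}{total:6d}")
```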


Illegal Pair                       Frequency        %

<identifier> <identifier>              119        12.93
<constant> <identifier>                100        10.86
<identifier> <constant>                 51         5.54
<statement> ELSE                        46         5.00
; <constant>                            42         4.56
<subscript> ;                           39         4.23
[illegible pair]                        31         3.36
<constant> <constant>                   29         3.15
; (                                     27         2.93
) <identifier>                          24         2.60
<assignment statement> TO               23         2.50
other                                  389        42.34

total                                  920       100.00

Fig. 5.4 The Most Common Illegal Pairs

We do not claim that these errors are representative of
those generally made in a higher-level programming language;
only that these were the errors made in SPL by some
beginning students.

The same programs were then parsed using the LALR
parser incorporating our error recovery routines. Figure
5.5 shows a representative selection of the results. Once
again, we shall first discuss some of the examples
individually.

1.  This example demonstrates a common problem
    encountered in parsing SPL programs. If the error
    occurs at the beginning of a DO group or BEGIN
    block, then in the process of recovering, the
    necessary information which resides in the stack
    is lost. Consequently, on encountering the
    corresponding END statement, another error occurs.
    The only suggestion we can offer as to how this
    can be avoided is to specifically prevent the
    recovery routine from replacing one nonterminal
    with anything but a certain other nonterminal, in
    this case, <control clause> with <group head>.

2.  The effect of this recovery is to place a
    semi-colon after the END.

4.  Note that a <while list> can be a DO and
    consequently the correct recovery was performed.

5.  [...] <statement> is an <assignment statement>
    followed by ;.

9.  The recovery seemed determined to make a multiple
    assignment statement of the offending line. It
    managed to do so, and the only damage done was the
    loss of the <group head>. The remarks made for
    example 1 are applicable here.

10. <block> is one of the many symbols which could
    have been used here for replacement. The only
    reason it was chosen in preference to the others
    is that it was the shortest nonterminal (see
    section 5.1).

13. Note the correction of a spelling error.

17. The error was eventually corrected, but not before
    an avalanche error was generated. The semi-colon
    should have been immediately discarded. This is
    an example of a situation where our recovery
    method's attempts to minimise the number of input
    symbols discarded had an unfortunate consequence.


Recovery was consistently good if the error was in a
<declaration statement> (examples 3,6,7), probably because
of the restricted number of symbols which are legal in these
statements. Errors encountered at the beginning of a group
or block created difficulties but those encountered at the
end of one (no semi-colon after the END statement, for
instance) were handled correctly. Most cases where recovery
was abandoned occurred when a comment appearing at the
beginning of a program was not correctly closed, making the
complete program into a comment. Once again, the greatest
problem is that of the compiler writer: what to do when
drastic changes are made to the stack.

The following table shows the lookahead depth ld and
depth into the stack sd used to effect recovery for the
errors from a subset of the programs. 99 of the programs
parsed contained at least one error, and 313 errors were
encountered.

[Table of ld against sd: contents not recoverable.]

93 (29.7%) of the recovery attempts were avalanche errors,
although 24 of these were caused by three cases of "reversed
programs", i.e., a missing quote mark or an incorrectly
formed begin-comment mark. In most cases, only one
avalanche error was produced by an incorrect recovery
attempt. Recovery was abandoned in 18 cases (5.8%).
Surprisingly, only three errors were due to a keyword being
misspelled. Two were corrected, while the third was not,
because the keyword was stacked as an <identifier> before an
error was detected.

5.3 Space Requirements

The following table gives the requirements for the
extra tables RECOVER1 and RECOVER2 for grammars G2 and SPL.


         # of          # of      size (bytes) of    size (bytes) of
         productions   states    tables other       recovery tables
                                 than recovery

G2           15           31           178                 42
SPL         132          303          3746               2622

It should be noted that whereas the size of the
elements of most of the tables was BIT(8), the recovery
tables for SPL had to be BIT(16), thus using twice the
amount of space they could have otherwise occupied. If the
nonterminal information and state information, now both in
RECOVER2, were split into separate and parallel tables, the
former could be BIT(8), and the table size would drop from
2622 to 2029.

5.4 Comparisons

In this section we shall present comparisons of our
method with two other recovery methods applied to SPL
programs. Both methods were tested on the same set of SPL
programs as our method. The two methods are:

1.  an MSP method, which is an error recovery method
    designed specifically for SPL and operates within
    an MSP parser. When an error is encountered,
    either illegal symbol pair or no production
    applicable,

    a)  the parse stack is popped until one of a
        small list of symbols (<statement>,
        <statement list>, <group head>, <block head>,
        etc.) lies on the top of the stack.

    b)  symbols in the input stream are skipped until
        the beginning of a new statement is
        encountered; this occurs when either a
        keyword which must begin a statement or a
        semi-colon is found. Allowance is made for
        avoiding reading an ELSE if the THEN part of
        an <if clause> may have been eliminated.

    The parse can always resume after recovery is
    complete, unless the end of file is encountered
    during recovery.

2.  a method which we shall call LALR method I, which
    uses the same parsing tables as our method
    (referred to in this section only as LALR method
    II) and which is language-independent, i.e.
    syntax-directed. However, it does not use the
    vectors RECOVER1 and RECOVER2. Its strategy is to
    scan down the parse stack until a state is
    encountered which can read the next input symbol.
    If no such state is found, the input symbol is
    discarded. This process is repeated until a state
    is found which can read a symbol from the input
    stream.
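
The strategy of LALR method I can be sketched in a few lines of Python. This is an illustration of the description above, not the program that was actually tested: the function name, the can_read predicate, and the toy state table are all assumptions made for the example.

```python
def scan_stack_recovery(state_stack, inputs, can_read):
    """Scan down the stack for a state that can read the next input symbol;
    if none exists, discard that symbol and try the following one.
    Returns (remaining stack, number of input symbols discarded) or None."""
    for discarded, symbol in enumerate(inputs):
        # Look from the top of the stack downwards.
        for depth in range(len(state_stack) - 1, -1, -1):
            if can_read(state_stack[depth], symbol):
                return state_stack[:depth + 1], discarded
    return None                           # input exhausted without recovery


# Hypothetical table: which symbols each state can read.
READS = {0: {"DO", "IF"}, 3: {";"}, 5: {"THEN"}}

if __name__ == "__main__":
    result = scan_stack_recovery([0, 3, 5], ["+", ";", "DO"],
                                 lambda state, sym: sym in READS[state])
    print(result)
```

Note that whenever the top state cannot read the offending symbol, this method necessarily pops a stack element or skips an input symbol; it never replaces a stack element in place.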


The following table and figures 5.6, 5.7, and 5.8
summarize the results obtained. Figure 5.8 is essentially
the same as the table on page 59.


                              MSP           LALR I        LALR II

errors detected               214           352           313
avalanche errors              38 (17.8%)    151 (42.9%)   93 (29.7%)
avalanches caused by
  reversed programs           2             81            24
abandoned recovery            32 (15.0%)    16 (4.5%)     18 (5.8%)
net errors                    176           201           220
average no. input
  symbols skipped             2.86          0.68          0.62
average no. stack
  elements popped             1.96          1.01          1.18


Fig. 5.6 Symbols Deleted Using MSP Method
[Matrix of stack elements popped (0 to >5) against input
symbols skipped (0 to >5); the individual entries are not
recoverable. Column totals for input symbols skipped:
84, 26, 4, 10, 15, 7, 36; total 182.]

Symbols Deleted Using LALR Method II
[Matrix of stack elements popped against input symbols
skipped; contents not recoverable.]

The row labelled "net errors" in the above table gives
the number of errors detected which were not avalanche
errors, i.e., the number of actual syntax errors detected.
The MSP method detected the fewest of these, a result which
is not surprising in view of both the number of input
symbols it consistently skipped over and the number of times
it simply gave up. LALR method I always deleted at least
one stack symbol or one input symbol, whereas LALR method
II, in almost half of its attempts, effected recovery by
merely replacing, not deleting, a stack element and not
skipping any input symbols; this perhaps explains why LALR
method II detected the most errors. Not unexpectedly, LALR
I performed best when an extra symbol had to be deleted,
but, as occurs in the majority of cases (see sec. 5.2), when
a symbol was missing, it tended to be as bad as the MSP
method in deleting stack elements.
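
The "net errors" row and the avalanche percentages follow directly from the first two rows of the table, as the Python sketch below shows. The MSP avalanche count is taken here as 38, the value consistent with 176 net errors out of 214 detected; the variable names are ours.

```python
# (errors detected, avalanche errors) for each method, from the table above.
results = {
    "MSP": (214, 38),        # 38 assumed: 214 detected - 176 net errors
    "LALR I": (352, 151),
    "LALR II": (313, 93),
}

# Net errors are the detected errors which were not avalanche errors.
net_errors = {m: detected - avalanche
              for m, (detected, avalanche) in results.items()}
# Avalanche errors as a percentage of all errors detected.
avalanche_rate = {m: round(100 * avalanche / detected, 1)
                  for m, (detected, avalanche) in results.items()}

if __name__ == "__main__":
    for method in results:
        print(method, net_errors[method], avalanche_rate[method])
```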

In the final row, the average number of stack elements
"popped" for LALR method II is slightly greater than that
for LALR method I because at least one stack element must
always be replaced (not popped) using the former method. In
fact, in some cases, the stack element was "replaced" by
itself or by something almost identical to itself. If
allowance were made for this fact, or if the suggestion made
in the first part of section 6.1 were carried out, the
figure of 1.18 would probably drop below the figure for LALR
method I.


The figures for avalanche errors caused by reversed
programs differ because the three methods have different
criteria for abandoning compilation: the MSP method gives
up rather easily, LALR I continues until a certain number
(25) of errors have been generated, and LALR II quits when
a search limit is reached. Taking this into consideration,
then, we see that the two LALR methods generate
approximately the same number of cases involving avalanche
errors. It should be noted that LALR method I (and, of
course, the MSP method) would tend to produce more avalanche
errors in the semantics phase of compilation than LALR II
because of their tendency to delete declaration statements
containing syntax errors.

Our LALR method II, then, would seem to be the best of
the three if only because it detects the most errors. It
has the added advantages, however, of reducing the potential
number of semantic compilation errors and reducing (as
compared to LALR I) the number of avalanche errors.

5.5 Problems

The recovery algorithm clearly does not always perform
the "best" correction. We shall now attempt to give some
reasons why this is so and what may be done to improve its
performance. In most cases the improvements force us to
leave the realm of language independence and make
concessions to features of a specific language. For further
language-independent improvements, see section 6.1.

5.5.1 Problem 1

The lookahead search limit and limit of search down the
stack may both be reached without finding a satisfactory
recovery. In this case the resulting action (the recovery
algorithm gives up and parsing terminates) occurs more
frequently than can be tolerated. Two remedies are proposed
for this problem.

Firstly, the right part of certain productions may be
unfortunately constructed for our purposes. By this we mean
that the production may be of the form A -> αbβ where α is
some (perhaps null) string of nonterminals and β is a
non-null string of nonterminals. If the error occurs just
after the terminal b, then the recovery routine may be
forced to look for the symbol which follows β. This could
involve looking far ahead in the program; farther than the
lookahead limit may allow. To remedy this the production
can be split into three productions, namely

     A   -> A'A"
     A'  -> αb
     A"  -> β

This problem manifested itself in example 7 of language L2.
The offending production was number 2: S -> xCD which could
be changed to

     S   -> S'S"
     S'  -> x
     S"  -> CD

In making this type of change we must keep in mind that the
size of the parsing tables is potentially increased.
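
The splitting of a production of this form is mechanical, and a sketch of it in Python may make the transformation concrete. The function name and the representation of productions as (left part, list of symbols) pairs are assumptions made for the illustration; the primed names are written with apostrophes.

```python
def split_production(lhs, rhs, terminals):
    """Split A -> alpha b beta into A -> A'A'', A' -> alpha b, A'' -> beta,
    where b is a terminal, alpha is a (possibly empty) string of
    nonterminals and beta is a non-empty string of nonterminals.
    Productions not of that form are returned unchanged."""
    for i, symbol in enumerate(rhs):
        if symbol in terminals:
            beta = rhs[i + 1:]
            # beta must be non-empty and contain nonterminals only.
            if beta and not any(s in terminals for s in beta):
                first, second = lhs + "'", lhs + "''"
                return [(lhs, [first, second]),
                        (first, rhs[:i + 1]),
                        (second, beta)]
            break                          # first terminal found; wrong form
    return [(lhs, rhs)]


if __name__ == "__main__":
    for production in split_production("S", ["x", "C", "D"], {"x", "b", "y"}):
        print(production)
```

Applied to production 2 of G2, this yields the three productions shown above.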

The other remedy, of course, is to increase one or both
of the search limits. This is not recommended, as it
defeats one of our goals in error recovery: that as many
syntax errors as possible should be detected during a
compilation.

5.5.2 Problem 2

The choice of nonterminals may impair the recovery.
For example, in SPL, the statement

     IF a=b GO TO label;

is missing a THEN before GO TO. On encountering the error,
the recovery routine will alter <if clause> to <statement>,
thereby nullifying the associated semantic actions and
creating a potential avalanche if an ELSE clause follows.
If, however, the productions

<conditional statement> -> <if clause> THEN <statement>
<conditional statement> -> <if clause> THEN
                           <basic statement> ELSE
                           <statement>

were changed to

<conditional statement> -> <if then> <statement>
<conditional statement> -> <if then> <basic statement>
                           ELSE <statement>

and the production

<if then> -> <if clause> THEN

was added, <if then> would replace <if clause> and neither
of these difficulties would occur.

At this stage we should note an objection which could
be made to these heuristics: that the grammar should not
have to be compromised to improve error recovery. The
grammar already has to be altered for other reasons.
Ambiguities must be removed and productions must be changed
for it to conform to the requirements of the class of
grammars (in our case, LALR(1)). Then changes must be made
so that productions are performed when corresponding
semantic actions are required. We would like to avoid
forcing the compiler writer to make even more changes.
However, if our error recovery system performs poorly for
his language, and if even the extensions proposed in section
6.1 do not help him, he may be compelled to do so to bring
recovery to a sufficiently high standard. For the language
SPL, the extensions in the next chapter would, in our
opinion, be entirely sufficient for consistently good error
recovery.

Chapter 6. Conclusions

6.1 Extensions

We  now  propose  some  extensions  to  the  error  recovery 
heuristics  which  would  make  the  recovery  system  more  useful 
in  an  actual  compiler. 

There may not be a unique correction for a given state
and a given lookahead symbol. That is, for some state J and
some b in Vt, RECOVER1(J) may point to more than one pair
in RECOVER2 which may "read" b. As it happens, the
nonterminals in RECOVER2 are ordered first by length of
their names and then, for those of the same length, in the
standard EBCDIC collating sequence. Thus <x> would precede
<ab> which would precede <ac>, etc. The search for a
nonterminal which can be placed on the stack proceeds in the
same order, and thus shorter nonterminals are chosen in
preference to longer nonterminals.
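
The ordering just described can be reproduced with a two-part sort key, as in the following Python sketch. Encoding each name with Python's cp037 codec is used here as a stand-in for "the standard EBCDIC collating sequence"; that choice of code page, like the function name, is an assumption made for the illustration.

```python
def recovery_order(nonterminals):
    """Order nonterminal names by length, then by EBCDIC byte values."""
    return sorted(nonterminals,
                  key=lambda name: (len(name), name.encode("cp037")))


if __name__ == "__main__":
    print(recovery_order(["<ac>", "<x>", "<ab>"]))
    print(recovery_order(["<statement>", "<go to>"]))
```

Under this ordering a short name such as <go to> is always tried before a longer one such as <statement>, whatever their merits as corrections.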

For example, given the SPL statement SUM I = A + B; an
error will occur when the <identifier> I is encountered.
The <identifier> SUM will be replaced by <go to> and an
avalanche error will ensue when the = is read. Looking at
other examples, it would seem that <go to> is very seldom
the optimal nonterminal for recovery. In this case, if
<go to> were not used, SUM would be replaced by <statement>,
a much more reasonable correction and one which would not
produce spurious errors.

The first extension which could be carried out would be
to first check if one of the nonterminals which could be
used for recovery is the same as that which corresponds to
the given state; this nonterminal would be chosen in
preference to the others. (This extension is the one
referred to in section 5.4.)

Another way to correct this would be to allow the
compiler writer to reorder the entries in RECOVER2 during
the table generation process. In the case of our example,
he could assign a low priority to the nonterminal <go to>,
and the pair for this nonterminal would be placed at or
near the end of the information for each state. These
priorities would have to be obtained from previous
experience with programs written in the language.

This procedure could be automated by altering the
recogniser to allow it to dynamically reorder entries in
RECOVER2 as it parses. If an error occurs in parsing and a
substitution is made which almost immediately produces
another error, we could move the pair which was used in the
first error to the end of (or at least farther along in) the
entries for the appropriate state. Eventually the tables
should stabilise to a point where the number of spurious
errors generated is minimised. Alternatively, we could
effect the reordering by dynamically collecting statistics on
the frequency of constructs in correct programs. The more
common a construct, the more likely it is to be a good one
for recovery, and entries in RECOVER2 could be reordered
accordingly.
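
The first of these automatic schemes, demoting a pair whose use is quickly followed by another error, might look like the Python sketch below. Nothing of the kind was implemented in the work described here; the function name, the representation of a state's entries as a list of (nonterminal, state) pairs, and the threshold of two symbols are all assumptions.

```python
def record_outcome(entries, used_index, symbols_before_next_error, threshold=2):
    """Move the pair at used_index to the end of a state's entries if the
    substitution was followed by another error within `threshold` symbols.
    symbols_before_next_error is None when no further error occurred."""
    if (symbols_before_next_error is not None
            and symbols_before_next_error <= threshold):
        entries.append(entries.pop(used_index))   # demote the offending pair
    return entries


if __name__ == "__main__":
    state_entries = [("<go to>", 7), ("<statement>", 9)]
    # <go to> was substituted and an error followed one symbol later.
    print(record_outcome(state_entries, 0, 1))
```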


When  an  error  is  encountered  which  cannot  be  recovered 
from  within  the  search  limits,  parsing  terminates.  Rather 
than  giving  up,  though,  we  could  at  this  point  invoke  some 
language-dependent  recovery  heuristic  in  an  attempt  to 
continue  the  parse.  If  such  a  situation  were  to  arise  in 
parsing  an  SPL  program,  for  example,  we  could  scan  ahead  to 
the  next  semi-colon  and  pop  the  stack  until  we  find  a  state 
which  can  read  that  symbol.  Standard  recovery  techniques 
already  exist  or  can  easily  be  devised  for  most  programming 
languages.  However,  if  general  rules  can  be  given  for  this 
type  of  error  recovery,  we  should  be  able  to  automate  it, 
and,  in  fact,  it  is  not  difficult  to  infer  from  the  grammar 
for SPL that the semi-colon is a "high-level" punctuator.
We could, then, alter the parser generator to select all
high-level terminals, or "hard tokens", and output the
relevant  information  in  the  form  of  tables;  the  recogniser 
could  use  these  as  a  last-ditch  effort  to  recover  from  a 
particularly  troublesome  error. 
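
A last-ditch recovery of this kind could be sketched in Python as follows. This is an illustration only: the function name, the can_read predicate and the toy table are hypothetical, and the decision to move on to the next hard token when no state can read the current one is our own assumption rather than part of the proposal.

```python
def hard_token_recovery(state_stack, inputs, hard_tokens, can_read):
    """Skip input up to a hard token, then pop the stack until a state is
    found which can read that token.  Returns (remaining stack, number of
    input symbols skipped) or None if the input is exhausted."""
    for skipped, symbol in enumerate(inputs):
        if symbol not in hard_tokens:
            continue                       # not a hard token: keep scanning
        stack = list(state_stack)
        while stack and not can_read(stack[-1], symbol):
            stack.pop()                    # pop until the token can be read
        if stack:
            return stack, skipped
    return None


# Hypothetical table: which symbols each state can read.
READS = {0: {"DO"}, 4: {";"}, 8: {"+"}}

if __name__ == "__main__":
    print(hard_token_recovery([0, 4, 8], ["A", "B", ";", "DO"], {";"},
                              lambda state, sym: sym in READS[state]))
```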
6.2 Conclusions

One problem from which this method suffers, as we have
pointed out from time to time, is the change in semantics
which each alteration to the stack necessitates. Although
we have suggested some ways in which these alterations may
in some cases be minimised, no general method for handling
this problem from the point of view of the semantics has
been offered; we leave this to the compiler writer, merely
noting here that in this respect the recovery scheme of
Gries, described in Chapter 3, is admittedly superior.


Inclusion of the extensions suggested in sec
will significantly improve recovery and the ad
"fine tuning" - specific changes to the recovery
directed towards the characteristics of a certain
or changes to the grammar such as those proposed in
5.5 - should elevate it to the level of a re
recovery system. We therefore claim that it would be
worth including the error recovery scheme described in this
thesis in a syntax-directed compiler writing system
languages.
APPENDIX


AN  SPL  GRAMMAR 


  1  <program> ::= <statement list>

  2  <statement list> ::= <statement>
  3      | <statement list> <statement>

  4  <statement> ::= <basic statement>
  5      | <conditional statement>

  6  <basic statement> ::= <assignment statement> ;
  7      | <go to statement> ;
  8      | <return statement> ;
  9      | <call statement> ;
 10      | <declaration statement> ;
 11      | <block> ;
 12      | <group> ;
 13      | <procedure definition> ;
 14      | <allocation statement> ;
 15      | <free statement> ;
 16      | ;
 17      | <label definition> <basic statement>

 18  <conditional statement> ::= <if clause> THEN <statement>
 19      | <if clause> THEN <basic statement> ELSE <statement>
 20      | <label definition> <conditional statement>

 21  <if clause> ::= IF <expression>

 22  <block> ::= <block head> <statement list> <ending>

 23  <block head> ::= BEGIN ;

 24  <group> ::= <group head> <statement list> <ending>

 25  <label definition> ::= <identifier> :

 26  <ending> ::= END
 27      | END <identifier>
 28      | <label definition> <ending>

 29  <procedure definition> ::= <procedure head> <statement list> <ending>

 30  <procedure head> ::= <label definition> PROCEDURE <formal parameters> <proc attribute> ;

 31  <proc attribute> ::=
 32      | <type>

 33  <formal parameters> ::=
 34      | ( <parameter list> )

 35  <parameter list> ::= <identifier>
 36      | <parameter list> , <identifier>

 37  <attribute list> ::= <attribute>
 38      | <attribute list> <attribute>

 39  <attribute> ::= ( <asterisk list> )
 40      | <type>
 41      | ENTRY
 42      | LABEL

 43  <declaration statement> ::= DECLARE <declaration list>

 44  <declaration list> ::= <declaration element>
 45      | <declaration list> , <declaration element>

 46  <declaration element> ::= <declaration primary>
 47      | <declaration primary> <attribute list>

 48  <declaration primary> ::= <identifier>
 49      | ( <declaration list> )

 50  <asterisk list> ::= *
 51      | <asterisk list> , *

 52  <type> ::= BIT
 53      | FIXED
 54      | FLOAT
 55      | CHARACTER

 56  <assignment statement> ::= <variable> = <expression>
 57      | <variable> , <assignment statement>

 58  <go to statement> ::= <go to> <variable>

 59  <go to> ::= GO TO
 60      | GOTO

 61  <return statement> ::= RETURN
 62      | RETURN <expression>

 63  <call statement> ::= CALL <variable>

 64  <allocation statement> ::= ALLOCATE <allocation list>

 65  <allocation list> ::= <allocation element>
 66      | <allocation list> , <allocation element>

 67  <allocation element> ::= <identifier> ( <bounds list> )

 68  <bounds list> ::= <bounds>
 69      | <bounds list> , <bounds>

 70  <bounds> ::= <expression>
 71      | <expression> : <expression>

 72  <free statement> ::= FREE <free list>

 73  <free list> ::= <identifier>
 74      | <free list> , <identifier>

 75  <group head> ::= <control clause> ;

 76  <control clause> ::= <repetition control>
 77      | <control clause> <case selector>

 78  <case selector> ::= CASE <arithmetic expression>

 79  <repetition control> ::= <while list>
 80      | <iteration list>

 81  <while clause> ::= WHILE <expression>

 82  <while list> ::= DO
 83      | <while list> <while clause>

 84  <iteration list> ::= <do variable> = <iteration element>
 85      | <iteration list> , <iteration element>

 86  <do variable> ::= DO <variable>

 87  <iteration element> ::= <iteration head> <incrementation control>
 88      | <iteration element> <while clause>

 89  <iteration head> ::= <expression>

 90  <incrementation control> ::= TO <arithmetic expression>
 91      | BY <arithmetic expression>
 92      | TO <arithmetic expression> BY <arithmetic expression>
 93      | BY <arithmetic expression> TO <arithmetic expression>
 94      |

 95  <expression> ::= <logical factor>
 96      | <expression> | <logical factor>

 97  <logical factor> ::= <logical secondary>
 98      | <logical factor> & <logical secondary>

 99  <logical secondary> ::= <logical primary>
100      | ¬ <logical secondary>

101  <logical primary> ::= <string expression>
102      | <string expression> <relation> <string expression>

103  <relation> ::= <
         | >
         | < =
         | > =
         | ¬ <
         | ¬ >

111  <string expression> ::= <arithmetic expression>
112      | <string expression> || <arithmetic expression>

113  <arithmetic expression> ::= <term>
114      | <arithmetic expression> + <term>
115      | <arithmetic expression> - <term>

116  <term> ::= <factor>
117      | <term> * <factor>
118      | <term> / <factor>

119  <factor> ::= <primary>
120      | <primary> ** <factor>
121      | + <factor>
122      | - <factor>

123  <primary> ::= <constant>
124      | <variable>
125      | ( <expression> )
126      | <type> ( <expression> )

127  <variable> ::= <identifier>
128      | <identifier> ( <subscript list> )

129  <subscript list> ::= <subscript>
130      | <subscript list> , <subscript>

131  <subscript> ::= <expression>
132      | *


